Introduction

In this guide, we’ll walk you through the installation of the BioLizardStyleR package and provide a brief overview of its functions using a practical example. The BioLizardStyleR package helps you make ggplot2 plots in the BioLizard style. It includes an integrated font installer, a dedicated ggplot2 theme, custom color functions, and a tool to add a BioLizard footer and export the plot.

1. Installation

To install the BioLizardStyleR package directly from GitHub, you will need to utilize the devtools package. The package requires our signature font “Lato”. If the font is not yet installed, installation will be attempted automatically when you load the library.

if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}

# make sure to look up BioConductor packages (used in the vignette)
setRepositories(ind = c(1:6, 8))

devtools::install_github("lizard-bio/nature-grade-visualization-playground", subdir="BioLizardStyleR")

For building the vignette, you also need the pasilla dataset. In case of issues with the installation of this package, you might have to install libbz2 (libbz2-dev (deb), libbz2-devel (rpm), bzip2 (brew)).

library(BioLizardStyleR)

If automatic font installation fails on your system, you can install the font manually using the following steps:

  1. Download Fonts: Download the font files from the GitHub repository Nature Grade Visualization Playground.
  2. Install Fonts: Install the downloaded fonts on your system with your favorite font manager. Double-clicking the .ttf file will already open your font manager in many cases.

2. Crafting a BioLizard-Styled ggplot

Transforming your ggplot to match the BioLizard style involves a three-step process. Let’s illustrate this on a basic ggplot.

library(ggplot2)
data("mtcars")
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5), ordered = TRUE)

testplot <- ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) + 
  ggplot2::geom_point(ggplot2::aes(color = gear),size=3) +  
  ggplot2::labs(title = "Miles per Gallon vs. Horsepower",  
                 x = "Horsepower",  
                 y = "Miles per Gallon",  
                 color = "Gears")
testplot

2.1 Applying the Biolizard theme

In the initial step, we implement the biolizard theme through the lizard_style() function. This not only modifies various theme settings but also adjusts the font. An important note here is that the lizard theme can only act as a starting template! Depending on your specific plot, title length, variables in use, and other factors, you might need to make further theme adjustments.

testplot <- testplot + lizard_style()
testplot

Tip: you can use theme_set(lizard_style()) at the beginning of your script or report to set the theme to lizard_style() in all plots, without having to add + lizard_style() to each plot individually. Te reset the theme to the ggplot2 default, you can call reset_theme_settings().

2.2 Adding a BioLizard color palette

Once you’ve styled your plot, you can enhance it with a ‘BioLizard’ color palette. You have the option to choose from qualitative, sequential, or divergent palettes tailored for either discrete or continuous variables. These palettes incorporate the distinct hues of the BioLizard house colors, while ensuring they are color-blind friendly and perceptually uniform. For a deeper understanding of each palette, it’s recommended to consult the documentation for the respective function.

The coloring functions within the package adhere to a flexible structure: scale_color/fill_biolizard. The type (discrete/continuous) and the scheme (qualitative/sequential/divergent) must be determined as function parameters.

data("mtcars")
mtcars$gear <- as.factor(mtcars$gear)

# Base plot
testplot <- ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
  ggplot2::geom_point(aes(color = gear), size=3) +
  ggplot2::labs(
    title = "Miles per Gallon vs. Horsepower",
    x = "Horsepower",
    y = "Miles per Gallon",
    color = "Gears"
  )

# Applying the qualitative color scale
testplot_qualitative <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "qualitative") + 
  lizard_style()
testplot_qualitative


# Applying the sequential color scales (assuming you have a different dataset or aesthetic)
## viridis-like scale
testplot_sequential <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "l_viridis") + 
  lizard_style()
testplot_sequential


## sequential scale inspired by BioLizard green
testplot_sequential <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "sequential") + 
  lizard_style()
testplot_sequential


# Applying the divergent color scale (assuming you have a different dataset or aesthetic)
testplot_divergent <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "divergent") + 
  lizard_style()
testplot_divergent

It is possible to reverse the color scales by setting reverse = TRUE

testplot_divergent <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "divergent", reverse = TRUE) +
  lizard_style()
testplot_divergent

The discrete color palettes can also be called directly, in case you would like to use them outside ggplot. The three base colors can also be used directly as blz_green, blz_blue and blz_yellow.

#use color palette
mycolors <- biolizard_pal_sequential(3, reverse = TRUE)
gearcolors <- mycolors[as.factor(mtcars$gear)]

graphics::plot(x = mtcars$hp, y = mtcars$mpg, pch = 19, col = gearcolors, family = "Lato")
graphics::legend("topright", legend = paste("Gears", 3:5), col = mycolors, pch = 19, bty = "n")

#use BLZ base colors
mycolors <- c(blz_green, blz_blue, blz_yellow)
gearcolors <- mycolors[as.factor(mtcars$gear)]

graphics::plot(x = mtcars$hp, y = mtcars$mpg, pch = 19, col = gearcolors, family = "Lato")
graphics::legend("topright", legend = paste("Gears", 3:5), col = mycolors, pch = 19, bty = "n")

2.4 Reset ggplot2 defaults

Note that calling lizard_style() will overwrite the default colors for points, lines, boxplots, areas, etc.. . To reset them to the ggplot2 defaults, you can either restart the R session or use reset_colors().

In the plot below, the points are shown in BioLizard green. Using reset_colors() will revert these changes:

p <- ggplot2::ggplot(mtcars, ggplot2::aes(mpg, disp)) + 
  ggplot2::geom_point()
p

reset_colors()
p

3. Examples

3.1 Simple bar chart

x = 1:8
y = 1:8

df <- data.frame(x=x, y=y)
ggplot2::ggplot(df, ggplot2::aes(x=x, y=y, fill = factor(x))) +
  ggplot2::geom_col() +
  scale_fill_biolizard() +
  lizard_style()

3.2 Simple polygon plot

ids <- factor(c("1.1", "2.1", "1.2", "2.2", "1.3", "2.3"))
values <- data.frame(
  id = ids,
  value = c(3, 3.1, 3.1, 3.2, 3.15, 3.5)
)
positions <- data.frame(
  id = rep(ids, each = 4),
  x = c(2, 1, 1.1, 2.2, 1, 0, 0.3, 1.1, 2.2, 1.1, 1.2, 2.5, 1.1, 0.3,
        0.5, 1.2, 2.5, 1.2, 1.3, 2.7, 1.2, 0.5, 0.6, 1.3),
  y = c(-0.5, 0, 1, 0.5, 0, 0.5, 1.5, 1, 0.5, 1, 2.1, 1.7, 1, 1.5,
        2.2, 2.1, 1.7, 2.1, 3.2, 2.8, 2.1, 2.2, 3.3, 3.2)
)

datapoly <- merge(values, positions, by = c("id"))

ggplot2::ggplot(datapoly, ggplot2::aes(x = x, y = y)) +
  ggplot2::geom_polygon(ggplot2::aes(group = id, fill = id)) +
  scale_fill_biolizard(type = "discrete", scheme = "sequential") +
  lizard_style()

3.3 Simple box plot

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = as.factor(gear), y = mpg)) +
  ggplot2::geom_boxplot() +
  ggplot2::labs(
    title = "Miles per Gallon vs. Gears",
    x = "Gears",
    y = "Miles per Gallon"
  ) +
  lizard_style()

A violin plot is similar to a boxplot, but captures more information on the data distribution:

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = as.factor(gear), y = mpg)) +
  ggplot2::geom_violin() +
  ggplot2::labs(
    title = "Miles per Gallon vs. Gears",
    x = "Gears",
    y = "Miles per Gallon"
  ) +
  lizard_style()

3.4 Simple density plot

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = mpg)) +
  ggplot2::geom_density() +
  lizard_style()

3.5 Simple scatterplot

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
  ggplot2::geom_point(ggplot2::aes(color= mpg)) +
  ggplot2::geom_smooth() +
  ggplot2::labs(
    title = "Miles per Gallon vs. horsepower",
    x = "Horsepower",
    y = "Miles per Gallon",
    color = "Miles per Gallon",
    fill = "Miles per Gallon"
  ) +
  scale_color_biolizard(type = "continuous", scheme = "l_viridis") +
  lizard_style()
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

3.6 Facets

Faceting is a great way to lay out related plots side-by-side, with the axes either on the same scale or on different scales.

# use facet_wrap to stratify plots by one variable
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  ggplot2::facet_wrap(ggplot2::vars(gear)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))


# use facet_grid for two variables
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  ggplot2::facet_grid(rows = ggplot2::vars(gear), cols = ggplot2::vars(carb)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))

Note that the minimal style makes it difficult to distinguish the different blocks from each other. Readability of the figure is improved by plotting the axes for each plot, but they are removed by facet_grid and facet_wrap when the plots are on the same scale. The similar functions facet_rep_grid and facet_rep_wrap from the lemon package are more suitable with the lizard style, since they will show all axes.

library("lemon")

# use facet_wrap to stratify plots by one variable
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  lemon::facet_rep_wrap(ggplot2::vars(gear)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))


# use facet_grid for two variables
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  lemon::facet_rep_grid(rows = ggplot2::vars(gear), cols = ggplot2::vars(carb)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))

3.7 Treemap plots

Treemap plots are a powerful way of visualizing hierarchical data such as GO terms, for example. The size of the rectangles corresponds to the value you want to show (e.g. p value), and related rectangles are grouped together. This example makes use of the Pokémon dataset from the highcharter package and is based on https://yjunechoe.github.io/posts/2020-06-30-treemap-with-ggplot/.

In this example, more colors are requested than are present in the qualitative color scales. The biolizard_pal_hue palette was developped for situations like this. It is a specific version of the scales::pal_hue function which is used by default by ggplot2, that starts at Biolizard’s signature green.

The treemapify package allows plotting treemaps with ggplot2.

library(treemapify)
library(dplyr)

data("pokemon", package = "highcharter")
# Cleaning up data for a treemap
data <- pokemon |>
  dplyr::select(pokemon, type_1, type_2) |>
  dplyr::mutate(type_2 = ifelse(is.na(type_2), paste("only", type_1), type_2)) |> 
  dplyr::group_by(type_1, type_2) |>
  dplyr::summarise(n = length(pokemon)) |> 
  dplyr::ungroup()
#> `summarise()` has grouped output by 'type_1'. You can override using the `.groups` argument.

head(data)
#> # A tibble: 6 × 3
#>   type_1 type_2       n
#>   <chr>  <chr>    <int>
#> 1 bug    electric     4
#> 2 bug    fairy        2
#> 3 bug    fighting     3
#> 4 bug    fire         2
#> 5 bug    flying      13
#> 6 bug    ghost        1

# plot the treemap
ggplot2::ggplot(data, ggplot2::aes(area = n, label = type_2, fill = type_1,
                                   subgroup = type_1)) +
  ggplot2::ggtitle("Number of pokémon stratified by type") +
  # 1. Draw type_2 borders and fill colors
  treemapify::geom_treemap() +
  # 2. Draw type_1 borders
  treemapify::geom_treemap_subgroup_border() +
  # 3. Print type_1 text
  treemapify::geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour = "black",
                                         fontface = "italic", min.size = 0,
                                         family = "Lato") +   #need to add "Lato" manually as this text is not accessed in theme()
  # 4. Print type_2 text
  treemapify::geom_treemap_text(colour = "white", place = "topleft", reflow = T,
                                family = "Lato") +   #need to add "Lato" manually as this text is not accessed in theme()
  
  # 5. BLZ style
  scale_fill_biolizard(scheme = "hues") +
  lizard_style() +
  ggplot2::theme(legend.position = "none")

However, this is not ideal, since the font of the text still has to be defined manually, and thus the lizard_style() function does not work very well for these treemaps, even though they are ggplot2-based. Another option is to use the ‘treemap’ library, and move away from ggplot2 altogether, which generally leads to more aesthetically pleasing plots:

library(treemap)

# get colors from the biolizard hues palette
ncols <- length(unique(data$type_1))
palette <- biolizard_pal_hue(ncols)

treemap::treemap(data, index=c("type_1", "type_2"), vSize="n", type="index",
                 title="Number of pokémon stratified by type", 
                 palette=palette,  #set the BLZ color palette
                 fontcolor.labels=c("#FFFFFFDD", "#00000080"), bg.labels=0, border.col="#00000080",
                 fontfamily.title = "Lato", fontfamily.labels = "Lato", fontfamily.legend = "Lato")  #set the lato font

3.8 Chicago plots

Idea from “A ggplot2 Tutorial for Beautiful Plotting in R”.

chic <- utils::read.csv("https://figshare.com/ndownloader/files/42307179") #An example dataset

g <- ggplot2::ggplot(chic, ggplot2::aes(x = season, y = o3,
                                        color = season)) +
  ggplot2::labs(x = "Season", y = "Ozone") + 
  ggplot2::geom_violin(fill = "gray80", linewidth = 1, alpha = .5) +
  ggplot2::geom_jitter(alpha = .25, width = .3) +
  ggplot2::coord_flip()

g + lizard_style() +
  scale_color_biolizard()

library(corrr)
library(forcats)

corm <-
  chic |> 
  dplyr::select(temp, dewpoint, pm10, o3) |> 
  corrr::correlate(diagonal = 1) |> 
  corrr::shave(upper = FALSE)
#> Correlation computed with
#> • Method: 'pearson'
#> • Missing treated using: 'pairwise.complete.obs'

corm <- corm |> 
  tidyr::pivot_longer(
    cols = -term,
    names_to = "colname",
    values_to = "corr"
  ) |> 
  dplyr::mutate(
    rowname = forcats::fct_inorder(term),
    colname = forcats::fct_inorder(colname),
    label = dplyr::if_else(is.na(corr), "", sprintf("%1.2f", corr))
  )

g <- ggplot2::ggplot(corm, ggplot2::aes(rowname, fct_rev(colname),
                                        fill = corr)) +
  ggplot2::geom_tile() +
  ggplot2::geom_text(ggplot2::aes(label = label)) +
  ggplot2::coord_fixed(expand = FALSE) +
  ggplot2::labs(x = NULL, y = NULL) 

g + scale_fill_biolizard(type='continuous',scheme='divergent',limits = c(-1, 1), 
                         na.value = "white", name="Pearson\nCorrelation") +
  lizard_style() +
  ggplot2::theme(legend.position = "inside", legend.position.inside = c(.95, .8))

g <- ggplot2::ggplot(chic, ggplot2::aes(temp, o3)) +
  ggplot2::geom_hex(color = "grey") +
  ggplot2::labs(x = "Temperature (°F)", 
                y = "Ozone Level", 
                title = 'How Temperature Affects Ozone Levels',
                subtitle ='Hexagonal Binning of Ozone Levels vs. Temperature in °F')

g <- g + lizard_style() +
  scale_fill_biolizard(type = 'continuous',scheme = 'sequential')
g


finalise_lizardplot(g,source="Source: The Chicago dataset (https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/)")
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : no font could be found for family "Nunito Sans 10pt Medium"
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : no font could be found for family "Nunito Sans 10pt Medium"
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : no font could be found for family "Nunito Sans 10pt Medium"

3.9 Interactive plots (DE analysis)

ggplotly can be used to make a ggplot interactive, for example to use in a html report. See an example of in interactive PCA plot and volcano plot below. To generate these plots, we used the per-gene read counts of the pasilla dataset. Make sure that you do not apply lizard_style() to the ggplot call when you want to use ggplotly. The function lizard_layout() applies the lizard theme to a plotly object.

Note: to succesfully install Pasilla, you might have to manually install libbz2 (libbz2-dev (deb), libbz2-devel (rpm), bzip2 (brew)).

library(pasilla)
library(plotly)

# get gene counts
datafiles <- system.file("extdata", package="pasilla", mustWork=TRUE)
count.table <- utils::read.table(file.path(datafiles, "pasilla_gene_counts.tsv"), header=TRUE, row.names=1, quote="", comment.char="" )
head(count.table)
#>             untreated1 untreated2 untreated3 untreated4 treated1 treated2 treated3
#> FBgn0000003          0          0          0          0        0        0        1
#> FBgn0000008         92        161         76         70      140       88       70
#> FBgn0000014          5          1          0          0        4        0        0
#> FBgn0000015          0          2          1          2        1        0        0
#> FBgn0000017       4664       8714       3564       3150     6205     3072     3334
#> FBgn0000018        583        761        245        310      722      299      308

# get metadata
SampleAnno <- utils::read.csv(file.path(datafiles, "pasilla_sample_annotation.csv"))
head(SampleAnno)
#>           file condition        type number.of.lanes total.number.of.reads exon.counts
#> 1   treated1fb   treated single-read               5              35158667    15679615
#> 2   treated2fb   treated  paired-end               2         12242535 (x2)    15620018
#> 3   treated3fb   treated  paired-end               2         12443664 (x2)    12733865
#> 4 untreated1fb untreated single-read               2              17812866    14924838
#> 5 untreated2fb untreated single-read               6              34284521    20764558
#> 6 untreated3fb untreated  paired-end               2         10542625 (x2)    10283129

# DE analysis with edgeR
library(edgeR)

Group <- as.factor(c(rep("untreated", 4), rep("treated", 3)))
y <- edgeR::DGEList(counts = count.table, group = Group)

# filter genes
keep <- edgeR::filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# normalize counts
y <- edgeR::calcNormFactors(y)
norm.counts <- edgeR::cpm(y, normalized.lib.sizes = TRUE, log = T)

# plot interactive PCA
mds <- limma::plotMDS(y, gene.selection="common", plot = F)
toplot <- data.frame(Dim1 = mds$x, Dim2 = mds$y, Group = Group, sample = colnames(mds$distance.matrix.squared))
pcaPlot <- ggplot2::ggplot(toplot, ggplot2::aes(Dim1, Dim2, colour = Group)) +
  ggplot2::geom_point() + 
  ggplot2::geom_text(ggplot2::aes(label = sample), size = 3, nudge_y = 0.05) +
  ggplot2::coord_fixed() +
  ggplot2::xlab(paste0("PC1: ", round(mds$var.explained[1]*100), "% variance")) +
  ggplot2::ylab(paste0("PC2: ", round(mds$var.explained[2]*100), "% variance")) +
  scale_color_biolizard() 

plotly::ggplotly(pcaPlot) |> lizard_layout()

# The lizard style is added through lizard_layout() on plotly instead of lizard_style() in ggplot

# fit general linear model
design <- stats::model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)
y <- edgeR::estimateDisp(y, design)
fit <- edgeR::glmQLFit(y, design)

# fit contrasts of interest
my.contrasts <- limma::makeContrasts(treatedVsUntreated = treated-untreated,
                                     levels=design)

lrt <- edgeR::glmQLFTest(fit, contrast = my.contrasts)

DE <- edgeR::topTags(lrt, n="all")$table
colnames(DE) <- c("log2FoldChange", "logCPM", "LR", "pvalue", "padj")
DE <- DE[order(DE$padj), ]
DE <- DE[!is.na(DE$padj), ]
head(DE)
#>             log2FoldChange   logCPM        LR       pvalue         padj
#> FBgn0039155      -4.605077 5.882320 1032.5624 3.519114e-11 2.786786e-07
#> FBgn0025111       2.906975 6.925429  718.6612 2.001949e-10 7.926717e-07
#> FBgn0029167      -2.189661 8.222828  527.6780 8.769068e-10 2.151196e-06
#> FBgn0035085      -2.549919 5.685708  504.5058 1.086600e-09 2.151196e-06
#> FBgn0003360      -3.171310 8.451316  467.1444 1.567356e-09 2.482378e-06
#> FBgn0039827      -4.149923 4.399320  412.0585 2.861632e-09 3.476204e-06

# interactive volcano plot
p <- DE |> 
  ggplot2::ggplot(ggplot2::aes(x = log2FoldChange, y = -log10(pvalue))) +
  ggplot2::geom_point(ggplot2::aes(color = padj < 0.05)) +
  scale_color_biolizard(reverse = TRUE)

plotly::ggplotly(p) |> lizard_layout()

3.10 Maps

note: to install ‘sf’ you might have to install libudunits2 and libgdal:

  • debian/ubuntu: sudo apt-get update && sudo apt-get install libudunits2-dev libgdal-dev
  • tophat/fedora: dnf install udunits2-devel libgdal-devel
  • OSX: brew install udunits & brew install gdal
library("sf")
library("rnaturalearth")
library("rnaturalearthdata")

world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")

BLZ_offices <- c("Belgium", "Netherlands", "Switzerland", "United States of America")
world$BLZ_office <- ifelse(world$name_en %in% BLZ_offices, "yes", "no")
world$BLZ_office <- factor(world$BLZ_office, levels = c("yes", "no"))

ggplot2::ggplot(data = world, ggplot2::aes(fill = BLZ_office)) +
  ggplot2::geom_sf() +
  scale_fill_biolizard(type = "discrete", scheme = "qualitative") +
  lizard_style()

# coloring the ocean, as in the map on our website
ggplot2::ggplot(data = world, ggplot2::aes(fill = BLZ_office)) +
  ggplot2::geom_sf() +
  ggplot2::scale_fill_manual(values = c(blz_yellow, blz_green)) +
  lizard_style() +
  ggplot2::theme(panel.background = ggplot2::element_rect(fill = blz_blue),
                 panel.grid = ggplot2::element_blank())

# europe only
ggplot2::ggplot(data = dplyr::filter(world, continent == "Europe"), ggplot2::aes(fill = BLZ_office)) +
  ggplot2::geom_sf() +
  ggplot2::scale_fill_manual(values = c(blz_yellow, blz_green)) +
  lizard_style() +
  ggplot2::theme(panel.background = ggplot2::element_rect(fill = blz_blue),
                 panel.grid = ggplot2::element_blank()) +
  ggplot2::coord_sf(xlim = c(-25, 50), ylim = c(35, 70), expand = FALSE)



BioLizard Logo
BioLizard Logo